A diagnostic model for overweight and obesity from untargeted urine metabolomics of soldiers

Soldiers in active military service need optimal physical fitness for successfully carrying out their operations. Therefore, their health status is regularly checked by army doctors. These inspections include physical parameters such as the body-mass index (BMI), functional tests, and biochemical studies. If a medical exam reveals an individual’s excess weight, further examinations are made, and corrective actions for weight lowering are initiated. The collection of urine is non-invasive and therefore attractive for frequent metabolic screening. We compared the chemical profiles of urinary samples of 146 normal weight, excess weight, and obese soldiers of the Mexican Army, using untargeted metabolomics with liquid chromatography coupled to high-resolution mass spectrometry (LC-MS). In combination with data mining, statistical and metabolic pathway analyses suggest increased S-adenosyl-L-methionine (SAM) levels and changes of amino acid metabolites as important variables for overfeeding. We will use these potential biomarkers for the ongoing metabolic monitoring of soldiers in active service. In addition, after validation of our results, we will develop biochemical screening tests that are also suitable for civil applications.


INTRODUCTION
Many professionals require a certain level of physical fitness for their work, particularly first-line responders such as firefighters, paramedics, and military personnel. To ensure their operability, they require, in addition to training, good eating habits and periodic review of their health status.
Overweight and obesity are present in most populations and are the origin of numerous metabolic diseases (Kaplan, 1989; Tchernof & Després, 2013; Cirulli et al., 2019). The World Health Organization (WHO) recognizes obesity as a global epidemic (James, 2008).
In Mexico, the prevalence of overweight and obesity is dramatically high at about 75% (Instituto Nacional de Salud Pública (MX), 2018). Thus, the Mexican official standard NOM-008-SSA3-2010 for the comprehensive management of obesity defines obesity as a public health problem in Mexico due to its magnitude and impact. Criteria for health management should support the early detection, prevention, comprehensive treatment, and control of the growing number of patients (Secretaría de Gobernación (MX), 2010).
Soldiers of the Mexican Army undergo regular health examinations by a military doctor. Since overweight and obese soldiers could present risks to their own health and to their missions, mainly in special corps such as the paratroopers, they are sent to lose weight in dedicated training camps such as the "Center for improving lifestyle and health" in Mexico City. Furthermore, the social security institute's law for the Mexican Armed Forces considers soldiers with a body mass index (BMI) greater than 30 as unfit for active service (Cámara de Diputados (MX), 2019). The medical assessment of the soldiers covers vital signs, weight, height, calculation of the BMI, clinical history, and a meticulous clinical examination of the body's organ systems. Additional laboratory and cabinet (diagnostic) studies are indicated if the doctor identifies alterations or abnormalities in these clinical analyses. All these studies could reveal possible diseases. However, in the case of overweight and obesity, the diagnosis is currently based only on the calculation of the BMI, without considering important aspects such as the patient's physiological and metabolic status.
Metabolites in body fluids can be analyzed to assess the nutrition and endogenous changes associated with overweight and obesity, using techniques such as nuclear magnetic resonance (NMR) and mass spectrometry (MS) (Xie, Waters & Schirra, 2012; Zhang, Sun & Wang, 2013). Usually, invasive studies such as blood analyses explore the patients' metabolic changes and monitor corrective actions. On the other hand, non-invasive tests are generally limited to phenotypic measurements such as the body mass index.
Analyzing urine would be more convenient for patients and provide information on the metabolism and pathways involved in particular conditions (Braga, 2017). Urine is a biofluid that contains different molecules generated by the organism's metabolism that must be eliminated and represents an excellent source of human sample material because it is available non-invasively. Typically, various molecules are altered simultaneously in diseased people (Bruzzone et al., 2021).
Artificial intelligence and machine learning algorithms can support medical diagnosis (Hatwell, Gaber & Azad, 2020). Classification is the most widely implemented machine learning task in the medical sector, employing, for example, the Adaptive Boost algorithm (Freund, 2001). Adaptive Boost pre-processing also helps to select the most important features automatically from high dimensional data and decision trees (Rangini & Jiji, 2013).
This study used untargeted metabolomics based on mass spectrometry to analyze urine from military personnel with normal and excess weight (overweight and obesity). Using Ada Boost data mining, we created a classification model and identified possible biomarkers for monitoring the metabolic state of soldiers and the early diagnosis of deviations.

Participants and sample preparation
Participants were recruited from the Military Medical Sciences Center, Mexico City, Mexico. Inclusion criteria were: both sexes, active military service, and signed consent to participate voluntarily. Participants answered a questionnaire to identify risk factors for obesity; the next day, nutritional status was assessed by bioelectrical impedance.
The body mass index (BMI) was calculated using Eq. (1), according to the WHO definition (World Health Organization (WHO), 2021):
$$\mathrm{BMI} = \frac{\text{weight}}{\text{height}^{2}} \tag{1}$$
with the person's weight measured in kilograms (kg) and the person's height in meters (m). Following the WHO system, soldiers with a BMI equal to or higher than 25 were classified as 'overweight,' and those with a BMI equal to or above 30 as 'obese' (World Health Organization (WHO), 2021).
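The BMI calculation and the two WHO cut-offs named above can be sketched in a few lines of Python. The function names are hypothetical and not part of the study's software. Since the text only gives the cut-offs at 25 and 30, the sketch does not distinguish underweight from normal weight.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by squared height in meters."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def weight_class(bmi_value: float) -> str:
    """Assign the group labels used in this study from the two WHO cut-offs.

    Assumption: every BMI below 25 is labelled 'normal weight', because the
    text does not define a lower limit.
    """
    if bmi_value >= 30:
        return "obese"
    if bmi_value >= 25:
        return "overweight"
    return "normal weight"


if __name__ == "__main__":
    # Illustrative values only, not study data.
    for weight, height in [(70.0, 1.75), (85.0, 1.75), (95.0, 1.75)]:
        value = bmi(weight, height)
        print(f"{weight} kg, {height} m: BMI {value:.1f} ({weight_class(value)})")
```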
The first urine of the day was collected at 6 am, and the samples were frozen at −60 °C until processing. For metabolomics analysis, urine samples were thawed and centrifuged at 850 g for 5 min. Ten µL of each sample were diluted in 90 µL of liquid chromatography-mass spectrometry (LC-MS) grade water (1:9 v/v) and transferred to vials for UPLC-MS analysis.

Untargeted metabolomics by HPLC-MS
LC-MS grade acetonitrile, water, and acetic acid were purchased from JT Baker (Brick Town, NJ, USA). Samples were analyzed with a Dionex UltiMate 3000 HPLC (Thermo Scientific, Waltham, MA, USA) coupled to an Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Scientific) with an electrospray ionization source. We used an AccuCore C18 column (4.6 × 150 mm, 2.6 µm) to separate metabolites using a binary gradient elution of solvents A and B, similar to the method described by López-Hernández et al. (2019). In short, the mobile phase was A: 0.5% acetic acid in water; B: 0.5% acetic acid in acetonitrile. The mobile phase was delivered at a flow rate of 0.5 mL/min, initially with 1% B, followed by a linear gradient to 15% B over 3 min. Solvent B was then increased to 50% within 3 min. Over the next 4 min, the gradient was ramped up to 90% B, followed by a plateau for 2 min. The amount of B was then decreased to 50% in 2 min. Two minutes later, solvent B was lowered to 15%, and finally, solvent B returned to initial conditions (1%) until the end of the chromatographic run (18 min). The column temperature was controlled at 40 °C. The injection volume was 20 µL.
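As a reading aid, the gradient program can be written as a list of breakpoints and interpolated. This is only a sketch under two assumptions that the text does not state explicitly: every segment is linear, and the return to 1% B is completed at the end of the run (18 min). The names `GRADIENT` and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints read from the method description.
# Assumption: linear segments; 1% B is reached again at 18 min.
GRADIENT = [
    (0.0, 1.0),
    (3.0, 15.0),
    (6.0, 50.0),
    (10.0, 90.0),
    (12.0, 90.0),  # end of the 2 min plateau
    (14.0, 50.0),
    (16.0, 15.0),
    (18.0, 1.0),
]


def percent_b(t_min: float) -> float:
    """Linearly interpolate the percentage of solvent B at time t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the chromatographic run")
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the end of the run
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

The breakpoints add up to the stated run time of 18 min, which is a quick consistency check of the description.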
Data were acquired in positive electrospray ionization (ESI+) mode with the capillary voltage set to 3.5 kV, the Ion Transfer Tube Temperature to 350 °C, and Vaporizer Temp to 400 °C. The desolvation gas was nitrogen with a flow rate of 50 arbitrary units. The detector type was Orbitrap at a resolution of 120,000. Data were acquired from m/z 50 to 2,000 in Full Scan mode with an AGC target of 2.0E5. Before the analysis, the mass spectrometer was calibrated with LTQ ESI Positive Ion Calibration Solution (Pierce, Thermo Scientific).

Processing of mzML files with KNIME
For mass spectrometry raw data processing and generation of an aligned feature matrix, we employed the OpenMS nodes (Sturm et al., 2008; Pfeuffer et al., 2017; Röst et al., 2016) of the KNIME Analytics Platform (https://www.knime.com) (Berthold et al., 2009; Alka et al., 2020). Figure 1 represents the KNIME workflow for the raw data processing and matrix generation. The exact parameters of each step are documented in the workflow.knime workflow file, provided as Supplementary Files at Zenodo (see 'Data Availability' statement below). To prepare the resulting table of aligned features for the MetaboAnalyst Web Server (Xia et al., 2009), we edited the .CSV file with vim (https://www.vim.org/), using the CSV vim plugin (<chrisbra/csv.vim>).

Statistical analyses with MetaboAnalyst
For metabolic classification models, we used the web-based version of MetaboAnalyst (https://www.metaboanalyst.ca/) (Xia et al., 2009; Chong, Yamamoto & Xia, 2019; Wishart, 2020). We applied the one-factor statistical analysis for peak intensities in a plain text file, with unpaired samples in columns.
The MetaboAnalyst report for the uploaded data is provided as a Supplemental File. First, we filtered the raw data by the interquartile range (IQR), normalized it by the median, and applied a square root transformation. Further, we used auto-scaling, i.e., the values were mean-centered and divided by the standard deviation of each variable.
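The four preparation steps (IQR filter, median normalization, square root transformation, auto-scaling) can be illustrated with a minimal standard-library sketch. It is not the MetaboAnalyst implementation. The fraction of features removed by the IQR filter (`drop_fraction`) is an assumed parameter, since the text does not state it, and the matrix is oriented with samples in rows for readability.

```python
import math
import statistics


def iqr(values):
    """Interquartile range of one feature across all samples."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def preprocess(matrix, drop_fraction=0.25):
    """matrix: list of samples, each a list of non-negative intensities.

    Returns (kept feature indices, processed matrix).
    Assumptions: drop_fraction is illustrative; sample medians are non-zero.
    """
    n_features = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_features)]

    # 1) IQR filter: discard the features with the lowest variability.
    ranked = sorted(range(n_features), key=lambda j: iqr(columns[j]))
    keep = sorted(ranked[int(drop_fraction * n_features):])
    rows = [[row[j] for j in keep] for row in matrix]

    # 2) Median normalization: divide each sample by its own median.
    rows = [[v / statistics.median(r) for v in r] for r in rows]

    # 3) Square root transformation.
    rows = [[math.sqrt(v) for v in r] for r in rows]

    # 4) Auto-scaling: mean-center each feature, divide by its standard deviation.
    scaled_columns = []
    for col in zip(*rows):
        mean = statistics.mean(col)
        sd = statistics.stdev(col)
        scaled_columns.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return keep, [list(r) for r in zip(*scaled_columns)]
```

After auto-scaling, every retained feature has a mean of zero and unit standard deviation, so features with high and low absolute intensities contribute equally to the multivariate models.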

Metabolic pathway enrichment and metabolite identification
For identifying metabolic pathway enrichment and likely involved metabolites, we used the Functional Analysis (MS peaks) tool of MetaboAnalyst (Li et al., 2013). We specified a mass search against the Human Metabolome Database (HMDB, https://hmdb.ca) (Wishart et al., 2018; Wishart et al., 2022), with 10 ppm mass tolerance in positive mode. We filtered raw data by the interquartile range (IQR), normalized by the median, and applied a square root transformation. Further, we used auto-scaling, i.e., the values were mean-centered and divided by the standard deviation of each variable (the same data preparation as for statistics above). For the Mummichog algorithm, we set a p-value cutoff of 0.25 (default top: 10% peaks). We used the pathway library of Homo sapiens MFN pathway/metabolite sets (a meta library) with at least five entries. The chemical structure and function of metabolites and the identifications from the Mummichog analysis were searched in the KEGG database (https://www.genome.jp/kegg/compound/) (Kanehisa et al., 2014), BiGG (http://bigg.ucsd.edu/universal/metabolites/) (King et al., 2016), the Edinburgh human metabolic network reconstruction (Ma et al., 2007) and the above-mentioned HMDB.
Table 1 summarizes statistical data of the 153 participants. Of the 67 women and 86 men, 66 presented normal weight, 62 had overweight, and 25 were obese. Comparing female and male soldiers, the latter exhibited a higher prevalence of overweight and obesity. As expected, the groups with higher BMI also presented a higher body fat content, suggesting metabolic differences between these groups.
Figure 2 shows the number of features in the different sample groups and blank samples. We removed data sets of presumably empty samples and technical outliers by comparing the number of features with blank injections and eliminating all analyses with fewer than 4,000 features.
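The clean-up rule amounts to a simple threshold on the number of detected features per analysis. A minimal sketch follows; the sample names and counts in the usage note are invented for illustration, and `keep_samples` is a hypothetical helper.

```python
MIN_FEATURES = 4000  # threshold stated in the text


def keep_samples(feature_counts, min_features=MIN_FEATURES):
    """Keep only analyses with at least min_features detected features.

    feature_counts: mapping of sample name to number of features.
    """
    return {name: n for name, n in feature_counts.items() if n >= min_features}
```

For example, `keep_samples({"sample_a": 5717, "blank_1": 310})` retains only `sample_a`, because the blank injection falls far below the threshold.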

Urinary metabolomics raw data processing and filtering
After clean-up, 52 samples of healthy, 47 overweight, and 21 obese individuals were left. We used these 120 data sets for further analysis. The healthy group showed 5,717 to 9,657, the overweight group 5,559 to 10,447, and the obese group 5,575 to 9,436 features.

Identification of metabolic identities with MetaboAnalyst
First, we applied a cluster analysis with the sparse PLS-DA (sPLS-DA) algorithm (Lê Cao, Boitard & Besse, 2011), which indicates distinct metabolic identities of healthy, overweight, and obese individuals. However, the clustering is far from perfect, and especially the group of overweight individuals does not separate well from the other groups (Fig. 3A). We discussed the difficulty of clustering metabolic data in an earlier paper (Winkler, 2015). To test if we could distinguish between healthy participants and others, we joined the overweight and obese groups and applied an orthogonal projections to latent structures discriminant analysis (OPLS-DA) (Trygg & Wold, 2002). As a result, two clusters were separated reasonably well: (1) samples of healthy individuals and (2) samples of overweight and obese soldiers (Fig. 3B).
The classification is imperfect; however, the graphics represent the medical situation of clearly healthy, obviously sick, and patients in transition. Consequently, we can discriminate between two metabolic identities of normal-weight and overweight/obese soldiers.

Statistical analysis of fold-changes
Using the same parameters for uploading the data (see 'Methods'), but only defining two groups, i.e., healthy and obese-overweight, we created the Volcano plot shown in Fig. 4. We did this analysis in the one-factor statistical analysis module of MetaboAnalyst. We used non-parametric Wilcoxon rank-sum tests with a fold-change threshold of 1.3 and a raw p-value threshold of 0.1, assuming equal group variance. Two hundred twenty-five significant differential variables were detected and subjected to an Adaptive Boost data mining analysis.
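The selection rule behind the Volcano plot combines a fold-change threshold with a rank-sum test. The sketch below illustrates the idea for a single feature and is not the MetaboAnalyst code. The rank-sum p-value uses the normal approximation without tie or continuity correction, which is a simplifying assumption, and the fold change is taken from the ratio of group means of non-scaled intensities (auto-scaled values have zero mean and cannot be used for ratios).

```python
import math
from statistics import mean


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p-value, normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which a value of x exceeds a value of y (ties count 0.5).
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def volcano_hit(case, control, fc_threshold=1.3, p_threshold=0.1):
    """True if a feature passes both the fold-change and the p-value threshold.

    case, control: positive intensities of one feature in the two groups.
    The fold-change criterion is applied in both directions (up and down).
    """
    log2_fc = math.log2(mean(case) / mean(control))
    p_value = rank_sum_p(case, control)
    return abs(log2_fc) >= math.log2(fc_threshold) and p_value < p_threshold
```

Working on the log2 scale treats a 1.3-fold increase and a 1.3-fold decrease symmetrically.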
With an overall error of 5.5% (Table 2), the Adaptive Boost classification between healthy and obese-overweight persons based on urinary metabolomics profiles is highly reliable, considering natural variations.
The important variables that contribute most to correct classification are shown in Fig. 5. Table 3 lists important variables from the Ada Boost analysis with at least a 1.3-fold significant change. Those ions are possible biomarkers for weight-related metabolic studies.
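To make the principle of Adaptive Boost and its variable importance more tangible, the following sketch implements discrete AdaBoost with decision stumps in plain Python. It is a didactic illustration written for this text, not the software used in the study. All function names are hypothetical, class labels are encoded as -1 (normal weight) and +1 (overweight-obese), and the importance measure (summed stump weights per feature) is a simple proxy.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Decision stump: vote +polarity above the threshold, -polarity otherwise."""
    return polarity if row[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively search the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def fit_adaboost(X, y, n_rounds=10):
    """Return a list of (alpha, feature, threshold, polarity) tuples."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        err, j, thr, pol = best_stump(X, y, w)
        perfect = err <= 1e-10
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, j, thr, pol))
        if perfect:
            break
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted majority vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


def importance(model, n_features):
    """Share of the total vote weight carried by each feature."""
    imp = [0.0] * n_features
    for alpha, j, _, _ in model:
        imp[j] += alpha
    total = sum(imp)
    return [v / total for v in imp]
```

In each round, samples that the previous stumps misclassified gain weight, so later stumps concentrate on the difficult cases. Features that are selected often and with high vote weights rank as important variables.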

Mummichog analysis: metabolic pathway enrichment
To explore affected metabolic pathways and facilitate the identification of metabolites, we performed a Mummichog analysis in MetaboAnalyst (see 'Methods').
As indicated in Table 4 and Fig. 6, five pathways demonstrated enrichment above the defined threshold limits:
• Urea cycle/amino group metabolism
• Alanine and aspartate metabolism
• Drug metabolism-cytochrome P450
• Aspartate and asparagine metabolism
• Ubiquinone biosynthesis
Especially the appearance of urea cycle/amino group metabolism as the first hit gives confidence to the Mummichog algorithm since no information about the origin of the samples was given to the MetaboAnalyst platform.

Thus, ions assigned to metabolites of enriched pathways have increased confidence in our further discussion.

Classification of normal weight vs. overweight-obese, based on metabolic signature
To develop a predictive classification model, we used the untargeted LC-MS features with at least a 1.3-fold change. The features correspond to ions with a particular retention time. Although a 30% increased or decreased metabolite level might not be critical for health, it can indicate a disturbed pathway.
Identifying compounds corresponding to the features is theoretically possible. However, the reliable assignment of metabolites is tedious (Rathahao-Paris et al., 2015; Jeffryes et al., 2015; Fuente et al., 2019; Djoumbou-Feunang et al., 2019; Dührkop et al., 2019), and the data mining models are helpful without knowing the related compounds (Winkler, 2015). Thus, we limited the identification of compounds to important variables. The OPLS-DA analysis already indicated distinct metabolic identities (Fig. 3B) for normal weight and overweight-obese individuals. A predictive model that we developed with the Adaptive Boost algorithm was able to classify normal weight and overweight-obese individuals with an overall error of 5.5% (Table 2). Notably, the highest errors were found in the validation and testing data of healthy soldiers wrongly classified as overweight or obese. These assignments could indicate a possible tendency of the soldiers to gain weight. The Adaptive Boost model demonstrates metabolic differences between normal weight and overweight-obese individuals, which can be used for classification. Further, the Adaptive Boost could provide a sensitive method to estimate the metabolic state and the tendency of a person to gain weight. However, additional studies are necessary to evaluate the performance of Adaptive Boost models with untargeted metabolic data as a predictive tool in clinical diagnostics and treatment.

Metabolic pathways in obesity-overweight and potential biomarkers
Compiling the biomarker candidate ions with likely metabolite identifications resulted in Fig. 7. Several ions and the metabolic pathway integration-derived metabolites hint at S-adenosyl-L-methionine (SAM). A previous study reported a 42% increase of SAM in the serum of test persons who were overfed by 1,250 kcal per day and gained weight above the median (Elshorbagy et al., 2016). SAM is synthesized from methionine and ATP and is a key metabolite since it donates methyl groups to different molecules, such as DNA, RNA, proteins, and lipids, in enzymatic reactions. The demethylated product, S-adenosylhomocysteine (SAH), is hydrolyzed by adenosylhomocysteinase, resulting in adenosine and homocysteine. Methionine synthase builds methionine by transferring a methyl group from 5-methyl-tetrahydrofolate to homocysteine (Finkelstein, 2000).
Several of these reactions have been reported to be altered in obesity. For example, high serum levels of homocysteine have been correlated with reduced high-density lipoprotein (HDL) levels. The accumulation of homocysteine comes with lower SAM and SAH levels, leading to a diminished production of phosphatidylcholine, which is essential for the production of low-density lipoproteins (LDL) and very-low-density lipoproteins (VLDL) (Obeid & Herrmann, 2009). Hyperlipidemia with increased serum homocysteine increases the risk of developing an atherosclerotic disease in overweight patients (Glueck et al., 1995).
In addition, elevated serum homocysteine is related to hepatic steatosis. The latter effect was pronounced with low folate intake (Gulsen et al., 2005). Strikingly, we also found the folate metabolism affected in our present study.
Another altered SAM-related pathway that we detected involves nicotinamide metabolism. Nicotinamide-N-methyl transferase (NNMT) methylates nicotinamide, using SAM as a methyl donor (Ramsden et al., 2017). Notably, NNMT is enriched in adipose tissue and the liver of patients with obesity and type 2 diabetes mellitus (DM2) (Kraus et al., 2014).
The possibility of detecting excess food energy intake in urine by measuring SAM would provide a non-invasive method for monitoring patients during weight-loss diets and professionals who require high physical fitness, such as soldiers. Thus, the level of SAM will be assayed in the following study during the treatment of obese military personnel.
In addition, several ions that putatively correspond to compounds from amino acid metabolism were identified. Changes in amino acid levels and related metabolites in obese patients have been reported in several studies (Xie, Waters & Schirra, 2012; Maltais-Payette et al., 2018; Yu et al., 2018). Therefore, our finding is expected. However, since we found the alteration of amino acid pathways through a variable importance analysis of untargeted metabolomics data, we suggest a high relevance of amino acid-related biomarkers compared to other groups of compounds such as TCA-cycle metabolites. Therefore, besides the SAM level, we will investigate the role of amino acid metabolism in obesity and weight reduction in future studies.

CONCLUSIONS
An Ada Boost model based on urinary metabolomics data could discriminate obese and overweight from healthy military personnel with a low overall error rate of 5.5%, indicating a metabolic signature related to the excessive ingestion of food.
Important variables from data mining, statistical analyses, and metabolic pathway enrichment analysis suggest S-adenosyl-L-methionine (SAM) as a possible urine biomarker for overfeeding. Increased SAM levels have been reported in the plasma of overfed people, but monitoring SAM in urine could be done daily for close follow-up of patients, for example, during weight-loss treatment, or of persons who need a high level of physical fitness, such as soldiers.
Amino acid metabolism also showed significant changes. Therefore, in ongoing studies, we include SAM, amino acid metabolism compounds, and acylcarnitines for evaluating the metabolic state of military personnel. In the future, our results will support the design of low-cost biochemical assays for the general public.